Towards an Annotated Corpus of Discourse Relations in Hindi
Authors
Abstract
We describe our initial efforts towards developing a large-scale corpus of Hindi texts annotated with discourse relations. Adopting the lexically grounded approach of the Penn Discourse Treebank (PDTB), we present a preliminary analysis of discourse connectives in a small corpus. We describe how discourse connectives are represented in the sentence-level dependency annotation in Hindi, and discuss how the discourse annotation can enrich this level for research and applications. The ultimate goal of our work is to build a Hindi Discourse Relation Bank along the lines of the PDTB. Our work will also contribute to the cross-linguistic understanding of discourse connectives.
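In the lexically grounded PDTB approach, an explicit discourse connective anchors a relation between two text-span arguments (Arg1 and Arg2), and the relation is labelled with a sense. The sketch below shows one possible in-memory representation of such a relation. It assumes a simple character-offset encoding. The class names (`Span`, `ExplicitRelation`), the helper `span_of`, the example Hindi sentence ("Ram was ill, so he did not go to school", with the connective इसलिए "so/therefore"), and the sense label (taken from the PDTB 2.0 sense hierarchy) are all hypothetical illustrations. None of them is the annotation format used by the authors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Half-open character offsets [start, end) into the raw text (assumed encoding)."""
    start: int
    end: int

    def text(self, source: str) -> str:
        return source[self.start:self.end]


@dataclass(frozen=True)
class ExplicitRelation:
    """Hypothetical record for one explicit, connective-anchored discourse relation."""
    connective: Span
    arg1: Span
    arg2: Span
    sense: str  # illustrative PDTB-2-style sense label


def span_of(source: str, substring: str) -> Span:
    """Locate the first occurrence of substring; raise if it is absent."""
    start = source.find(substring)
    if start < 0:
        raise ValueError(f"{substring!r} not found in text")
    return Span(start, start + len(substring))


# Hypothetical example: "Ram was ill, so he did not go to school."
text = "राम बीमार था, इसलिए वह स्कूल नहीं गया।"

rel = ExplicitRelation(
    connective=span_of(text, "इसलिए"),      # the connective itself, excluded from both args
    arg1=span_of(text, "राम बीमार था"),      # Arg1: the other argument
    arg2=span_of(text, "वह स्कूल नहीं गया"),  # Arg2: syntactically bound to the connective
    sense="Contingency.Cause.Result",
)

for name in ("connective", "arg1", "arg2"):
    print(name, "=", getattr(rel, name).text(text))
```

Storing offsets rather than copied strings keeps each annotation tied to its position in the source text, which is one common way to keep stand-off annotations separate from sentence-level layers such as dependency trees.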
Similar resources
The Hindi Discourse Relation Bank
We describe the Hindi Discourse Relation Bank project, aimed at developing a large corpus annotated with discourse relations. We adopt the lexically grounded approach of the Penn Discourse Treebank, and describe our classification of Hindi discourse connectives, our modifications to the sense classification of discourse relations, and some crosslinguistic comparisons based on some initial annot...
Experiments with Annotating Discourse Relations in the Hindi Discourse Relation Bank
In the Hindi Discourse Relation Bank (HDRB) project, we are developing a large corpus annotated with discourse relations, such as causal, temporal, contrastive and conjunctive relations. Adopting the lexically grounded approach of the Penn Discourse Treebank (PDTB), we annotate the argument structure of both explicit and implicit discourse relations, as well as the senses of relations. We descr...
The Penn Discourse TreeBank as a Resource for Natural Language Generation
While many advances have been made in Natural Language Generation (NLG), the scope of the field has been somewhat restricted because of the lack of annotated corpora from which properties of texts can be automatically acquired and applied towards the development of generation systems. In this paper, we describe how the Penn Discourse TreeBank (PDTB) can serve as a valuable large scale annotated...
The Leeds Arabic Discourse Treebank: Annotating Discourse Connectives for Arabic
We present the first effort towards producing an Arabic Discourse Treebank, a news corpus where all discourse connectives are identified and annotated with the discourse relations they convey as well as with the two arguments they relate. We discuss our collection of Arabic discourse connectives as well as principles for identifying and annotating them in context, taking into account properties...
A Corpus Study for Identifying Evidence on Microblogs
Microblogs are a popular way for users to communicate and have recently caught the attention of researchers in the natural language processing (NLP) field. However, regardless of their rising popularity, little attention has been given towards determining the properties of discourse relations for the rapid, large-scale microblog data. Therefore, given their importance for various NLP tasks, we ...